Grammar Compression by Induced Suffix Sorting

Abstract

A grammar compression algorithm, called GCIS, is introduced in this work. GCIS is based on the induced suffix sorting algorithm SAIS, presented by Nong et al. in 2009. The proposed solution builds on the factorization performed by SAIS during suffix sorting. A context-free grammar is used to replace factors by non-terminals. The algorithm is then recursively applied on the shorter sequence of non-terminals. The resulting grammar is encoded by exploiting some redundancies, such as common prefixes between the right-hands of rules, sorted according to SAIS. GCIS excels for its low space and time required for compression while obtaining competitive compression ratios. Our experiments on regular and repetitive, moderate and very large texts, show that GCIS stands as a very convenient choice compared to well-known compressors such as Gzip and 7-Zip; RePair, the gold standard in grammar compression; and recent compressors such as SOLCA, LZRR, and LZD. In exchange, GCIS is slow at decompressing. Yet, grammar compressors are more convenient than Lempel-Ziv compressors in that one can access text substrings directly in compressed form, without ever decompressing the text. We demonstrate that GCIS is an excellent candidate for that scenario, because it is shown to be competitive among the alternatives. We also show that the relation of GCIS with SAIS makes it a good intermediate structure to build the suffix array and the LCP array during decompression of the text.
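The factor-and-recurse idea in the abstract can be illustrated with a small Python sketch. It is a simplified illustration under stated assumptions, not the authors' implementation: factors are cut at the LMS positions that SAIS defines, the last character is treated as L-type (as if a smaller sentinel followed it), rule names come from Python's built-in sort rather than from induced sorting, and the rule-encoding step is omitted. All function names (`lms_positions`, `factorize_once`, `compress`, `decompress`) are hypothetical.

```python
def lms_positions(seq):
    """Positions that are S-type and directly preceded by an L-type position."""
    n = len(seq)
    # is_s[i] is True when suffix i is smaller than suffix i + 1 (S-type).
    # Assumption: the last symbol is L-type, as if followed by a sentinel.
    is_s = [False] * n
    for i in range(n - 2, -1, -1):
        if seq[i] < seq[i + 1]:
            is_s[i] = True
        elif seq[i] == seq[i + 1]:
            is_s[i] = is_s[i + 1]
    return [i for i in range(1, n) if is_s[i] and not is_s[i - 1]]


def factorize_once(seq):
    """One round: cut seq at LMS positions and replace each factor by a rule id."""
    seq = list(seq)
    cuts = lms_positions(seq)
    # Symbols before the first LMS position are kept verbatim.
    head = seq[:cuts[0]] if cuts else seq
    bounds = cuts + [len(seq)]
    factors = [tuple(seq[a:b]) for a, b in zip(bounds, bounds[1:])]
    # Plain sorting stands in for the order that induced sorting would give.
    rules = sorted(set(factors))
    name = {factor: k for k, factor in enumerate(rules)}
    return head, rules, [name[f] for f in factors]


def compress(text):
    """Apply factorize_once repeatedly until no factor is found."""
    seq, levels = list(text), []
    while True:
        head, rules, reduced = factorize_once(seq)
        if not rules:
            return levels, seq
        levels.append((head, rules))
        seq = reduced


def decompress(levels, seq):
    """Expand rule ids level by level, from the last round back to the first."""
    for head, rules in reversed(levels):
        expanded = list(head)
        for symbol in seq:
            expanded.extend(rules[symbol])
        seq = expanded
    return "".join(seq)
```

For the input `banana`, the first round of this sketch keeps `b` verbatim, creates the two rules `an` and `ana`, and reduces the text to the sequence `[0, 1]`; the second round finds no further factor and stops.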

Similar resources

A Grammar Compression Algorithm based on Induced Suffix Sorting

We introduce GCIS, a grammar compression algorithm based on the induced suffix sorting algorithm SAIS, presented by Nong et al. in 2009. Our solution builds on the factorization performed by SAIS during suffix sorting. We construct a context-free grammar on the input string which can be further reduced into a shorter string by substituting each substring by its corresponding factor. The resulti...

Unifying Text Search and Compression: Suffix Sorting, Block Sorting and Suffix Arrays

Today many electronic documents are available such as articles of newspapers, dictionaries, books, DNA sequences, etc. and they are stored in databases. We also have many documents on the Internet and have many e-mail documents. Therefore, fast queries on such huge amount of documents and their compression to reduce costs for storing or transferring them are important. In this thesis, a unified ...

In-Place Suffix Sorting

Given string T = T[1, ..., n], the suffix sorting problem is to lexicographically sort the suffixes T[i, ..., n] for all i. This problem is central to the construction of suffix arrays and trees with many applications in string processing, computational biology and compression. A bottleneck in these applications is the amount of workspace needed to perform suffix sorting beyond the spac...
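As a point of reference for the problem statement above, suffix sorting can be written directly from its definition. The sketch below is a naive baseline and not the in-place algorithm of that paper: it uses 0-based indices instead of the 1-based indices in the text, and it materializes every suffix as a sort key, so it needs far more workspace than an in-place method. The function name `naive_suffix_array` is hypothetical.

```python
def naive_suffix_array(t):
    """Return the starting positions of the suffixes of t in lexicographic order."""
    # Each key t[i:] is a full copy of suffix i; simple, but memory-hungry.
    return sorted(range(len(t)), key=lambda i: t[i:])


print(naive_suffix_array("banana"))  # [5, 3, 1, 0, 4, 2]
```

The output lists the suffixes `a`, `ana`, `anana`, `banana`, `na`, `nana` by their starting positions.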

Faster suffix sorting

We propose a fast and memory efficient algorithm for lexicographically sorting the suffixes of a string, a problem that has important applications in data compression as well as string matching. Our algorithm eliminates much of the overhead of previous specialized approaches while maintaining their robustness for all kinds of input. For input size n, our algorithm operates in only two integer a...

Notes on Suffix Sorting

We study the problem of lexicographically sorting the suffixes of a string of symbols. In particular, we analyze the time complexity of Sadakane’s suffix sorting algorithm [8], showing that this is O(n log n) in the worst case. We also give a small improvement in the space requirements of this algorithm. We conclude that Sadakane’s algorithm, which has previously been shown to outperform the cl...

Journal

Journal title: ACM Journal of Experimental Algorithms

Year: 2022

ISSN: 1084-6654

DOI: https://doi.org/10.1145/3549992